
    Multimodal Visual Concept Learning with Weakly Supervised Techniques

    Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as a means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: Fuzzy Sets Multiple Instance Learning (FSMIL) and Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description's semantics with Probabilistic Labels; both are formulated through a convex optimization algorithm. In addition, we provide a novel technique to extract weak labels in the presence of complex semantics, which consists of semantic similarity computations. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our methods considerably outperform a state-of-the-art weakly supervised approach, as well as other baselines. Comment: CVPR 201
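    The weak-label extraction step can be illustrated with a small sketch. The abstract does not spell out the similarity measure used, so the bag-of-words cosine below, the `weak_labels` helper, and the example description and concept texts are all illustrative assumptions, not the authors' implementation: a textual description is scored against each candidate concept, and the normalized scores serve as probabilistic weak labels.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def weak_labels(description: str, concepts: dict) -> dict:
    """Assign each concept a probabilistic weak label proportional to its
    textual similarity with the description (normalized to sum to 1)."""
    d = Counter(description.lower().split())
    sims = {c: cosine(d, Counter(text.lower().split()))
            for c, text in concepts.items()}
    total = sum(sims.values())
    return {c: s / total for c, s in sims.items()} if total else sims

# Hypothetical screenplay snippet scored against two candidate concepts.
labels = weak_labels(
    "he draws his sword and charges",
    {"fight": "sword fight charges attack",
     "dialogue": "two people talk quietly"},
)
```

    In the actual method these probabilistic labels then enter the MIL objective as soft supervision rather than hard class assignments.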

    Pre-training Music Classification Models via Music Source Separation

    In this paper, we study whether music source separation can be used as a pre-training strategy for music representation learning, targeted at music classification tasks. To this end, we first pre-train U-Net networks under various music source separation objectives, such as the isolation of vocal or instrumental sources from a musical piece; afterwards, we attach a convolutional tail network to the pre-trained U-Net and jointly fine-tune the whole network. The features learned by the separation network are also propagated to the tail network through skip connections. Experimental results on two widely used and publicly available datasets indicate that pre-training the U-Nets with a music source separation objective can improve performance compared to both training the whole network from scratch and using the tail network as a standalone model, in two music classification tasks: music auto-tagging, when vocal separation is used, and music genre classification in the case of multi-source separation. Comment: 5 pages (4+references), 3 figures. ICASSP-24 submissio
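    The two-stage recipe (pre-train on separation, then fit a classifier on the learned features) can be sketched with a toy linear analogue. This is purely illustrative: closed-form least squares stands in for U-Net training, random vectors stand in for spectrograms, and all variable names are made up for the sketch; it only mirrors the workflow, not the paper's models.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 400, 16
source = rng.normal(size=(n, d))    # stands in for the vocal source
noise = rng.normal(size=(n, d))     # stands in for the accompaniment
mixture = source + 0.5 * noise      # the observed "song"

# Stage 1: pre-train the "encoder" on a separation objective, i.e.
# recover the source from the mixture (closed-form least squares).
W = np.linalg.lstsq(mixture, source, rcond=None)[0]
features = mixture @ W              # separation-aware representation

# Stage 2: fit a "tail" classifier on the pre-trained features for a
# downstream label that depends on the source content.
y = np.sign(source[:, 0])
w_tail = np.linalg.lstsq(features, y, rcond=None)[0]
accuracy = (np.sign(features @ w_tail) == y).mean()
```

    In the real pipeline the encoder is a U-Net, the tail is convolutional, and skip connections additionally feed intermediate separation features into the tail before joint fine-tuning.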

    Multi-Source Contrastive Learning from Musical Audio

    Contrastive learning constitutes an emerging branch of self-supervised learning that leverages large amounts of unlabeled data by learning a latent space in which pairs of different views of the same sample are associated. In this paper, we propose musical source association as a pair-generation strategy in the context of contrastive music representation learning. To this end, we modify COLA, a widely used contrastive audio learning framework, to learn to associate a song excerpt with a stochastically selected and automatically extracted vocal or instrumental source. We further introduce a novel modification to the contrastive loss that incorporates information about the presence or absence of specific sources. Our experimental evaluation on three different downstream tasks (music auto-tagging, instrument classification and music genre classification), using the publicly available Magna-Tag-A-Tune (MTAT) as a source dataset, yields results competitive with existing methods in the literature, as well as faster network convergence. The results also show that this pre-training method can be steered towards specific features, according to the selected musical source, while also being dependent on the quality of the separated sources. Comment: 8 pages, 5 figures, 3 tables. (Slightly edited) submission at SMC2
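    The pair-generation strategy can be sketched as follows. The function name, the source names, and the presence-mask convention are illustrative assumptions (the paper builds on COLA, but this is not its code): an anchor excerpt is paired with one stochastically chosen separated source, and a presence mask records which sources exist so the modified loss can use it.

```python
import random

def make_pair(song_excerpt, separated, rng=random):
    """Pick one separated source at random as the positive view.

    `separated` maps source names ("vocals", "accompaniment", ...) to
    the automatically separated signals; a source may be absent (None),
    e.g. no vocals in an instrumental track.
    """
    available = {k: v for k, v in separated.items() if v is not None}
    name = rng.choice(sorted(available))
    presence = {k: v is not None for k, v in separated.items()}
    return song_excerpt, available[name], name, presence

# Instrumental example: the vocal source is absent, so the
# accompaniment is the only candidate positive view.
anchor, positive, src, presence = make_pair(
    [0.1, -0.2, 0.3],
    {"vocals": None, "accompaniment": [0.05, -0.1, 0.2]},
)
```

    The presence mask is what the modified contrastive loss would consume to account for missing sources.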

    Music signal processing with application to recognition

    This thesis lies in the area of signal processing and analysis of music signals using computational methods for the extraction of effective representations for automatic recognition. We explore and develop efficient algorithms using nonlinear methods for the analysis of the structure of music signals, which is of importance for their modeling. Our main research direction deals with the analysis of the structure and the characteristics of musical instruments, in order to gain insight into their function and properties. We study the characteristics of the different genres of music. Finally, we evaluate the effectiveness of the proposed nonlinear models for the detection of perceptually important music and audio events.

    The approach we follow contributes to state-of-the-art technologies for automatic computer-based recognition of musical signals and audio summarization, which are nowadays essential in everyday life. Because of the vast amount of music, audio and multimedia data on the web and on our personal computers, this study finds use in applications such as automatic genre classification, automatic recognition of music's basic structures, such as musical instruments, and audio content analysis for music and audio summarization.

    The above-mentioned applications require robust solutions to information processing problems. Towards this goal, the development of efficient digital signal processing methods and the extraction of relevant features are of importance. In this thesis we propose such methods and algorithms for feature extraction, with results that render the descriptors directly applicable. The proposed methods are applied to classification experiments, illustrating that they can capture important aspects of music, such as the micro-variations of its structure.
    Descriptors based on macro-structures may reduce the complexity of the classification system, since satisfactory results can be achieved using simpler statistical models. Finally, the introduction of a "music" filterbank appears to be promising for automatic genre classification.
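    The abstract mentions a "music" filterbank but does not detail its design, so the sketch below makes an assumption: triangular filters centered at equal-tempered note frequencies (analogous to a mel filterbank, but aligned to the chromatic scale). The function name and all parameters are illustrative, not the thesis design.

```python
import numpy as np

def music_filterbank(n_fft=1024, sr=22050, f_min=110.0, n_notes=36):
    """Return a (n_notes, n_fft // 2 + 1) matrix of triangular filters
    centered at equal-tempered note frequencies: f_min * 2**(k / 12)."""
    centers = f_min * 2.0 ** (np.arange(-1, n_notes + 1) / 12.0)
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_notes, freqs.size))
    for i in range(n_notes):
        lo, c, hi = centers[i], centers[i + 1], centers[i + 2]
        rising = (freqs - lo) / (c - lo)      # ramp up to the center
        falling = (hi - freqs) / (hi - c)     # ramp down past the center
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

fb = music_filterbank()
# Applying `fb` to an FFT magnitude frame yields note-aligned band
# energies in place of mel-band energies.
```

    Note that at typical FFT resolutions the lowest semitone-wide filters can be narrower than one frequency bin; a practical design would widen or merge them.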

    Musical instruments signal analysis and recognition using fractal features

    Analyzing the structure of music signals at multiple time scales is of importance both for modeling music signals and for their automatic computer-based recognition. In this paper we propose the multiscale fractal dimension profile as a descriptor for quantifying the multiscale complexity of the music waveform. We have experimentally found that this descriptor can discriminate among different musical instruments in several respects. We compare the descriptiveness of our features against that of Mel frequency cepstral coefficients (MFCCs) using both static and dynamic classifiers, such as Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The methods and features proposed in this paper are promising for music signal analysis and of direct applicability in large-scale music classification tasks.